
    PasMoQAP: A Parallel Asynchronous Memetic Algorithm for solving the Multi-Objective Quadratic Assignment Problem

    Multi-Objective Optimization Problems (MOPs) have attracted growing attention during the last decades. Multi-Objective Evolutionary Algorithms (MOEAs) have been extensively used to address MOPs because they are able to approximate a set of high-quality non-dominated solutions. The Multi-Objective Quadratic Assignment Problem (mQAP) is one such MOP: it generalizes the classical QAP, which has been extensively studied and used in several real-life applications, by taking as input several flows between the facilities, which generate multiple cost functions that must be optimized simultaneously. In this study, we propose PasMoQAP, a parallel asynchronous memetic algorithm to solve the Multi-Objective Quadratic Assignment Problem. PasMoQAP is based on an island model that structures the population into sub-populations. The memetic algorithm on each island evolves a reduced population of solutions, and the islands cooperate asynchronously by sending selected solutions to their neighbors. The experimental results show that our approach significantly outperforms all the island-based variants of the multi-objective evolutionary algorithm NSGA-II. We show that PasMoQAP is a suitable alternative for solving the Multi-Objective Quadratic Assignment Problem.

    Comment: 8 pages, 3 figures, 2 tables. Accepted at the Conference on Evolutionary Computation 2017 (CEC 2017)
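The island-model cooperation pattern the abstract describes can be sketched as follows. This is a minimal, single-objective, synchronous stand-in for the real algorithm (PasMoQAP is multi-objective and asynchronous): `fitness`, `evolve` and `migrate` are hypothetical names, and the objective is a toy bit-counting function rather than a QAP cost.

```python
import random

random.seed(0)

def fitness(bits):
    # Toy stand-in for a QAP cost: maximise the number of ones.
    return sum(bits)

def mutate(bits):
    child = bits[:]
    i = random.randrange(len(child))
    child[i] ^= 1                     # flip one bit
    return child

def evolve(pop, steps):
    # Steady-state step on one island: mutate the current best,
    # replace the worst individual if the child improves on it.
    for _ in range(steps):
        child = mutate(max(pop, key=fitness))
        worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
        if fitness(child) > fitness(pop[worst]):
            pop[worst] = child

def migrate(islands):
    # Ring topology: each island's best replaces its neighbour's worst.
    bests = [max(pop, key=fitness) for pop in islands]
    for i, pop in enumerate(islands):
        incoming = bests[(i - 1) % len(islands)]
        worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
        if fitness(incoming) > fitness(pop[worst]):
            pop[worst] = incoming[:]

n_bits, n_islands = 32, 4
islands = [[[random.randint(0, 1) for _ in range(n_bits)] for _ in range(10)]
           for _ in range(n_islands)]
for _ in range(30):                   # epochs of local evolution + migration
    for pop in islands:
        evolve(pop, steps=50)
    migrate(islands)

best = max((ind for pop in islands for ind in pop), key=fitness)
print(fitness(best))                  # should approach the optimum of 32
```

The design point illustrated is that each island evolves independently and only exchanges a few elite solutions with its neighbour, rather than sharing one global population.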

    Entropy-based High Performance Computation of Boolean SNP-SNP Interactions Using GPUs

    It is increasingly accepted that traditional statistical Single Nucleotide Polymorphism (SNP) analysis of Genome-Wide Association Studies (GWAS) reveals just a small part of the heritability of complex diseases. Studying SNP-SNP interactions identifies additional SNPs that contribute to disease but that do not reach genome-wide significance or exhibit only epistatic effects. We have introduced a methodology for genome-wide screening of epistatic interactions that can be handled by state-of-the-art high-performance computing technology. Unlike standard software, our method computes all Boolean binary interactions between SNPs across the whole genome without assuming a particular model of interaction. Our exhaustive search for epistasis comes at the expense of higher computational complexity, which we tackled using graphics processors (GPUs) to reduce the computational time from several months on a cluster of CPUs to 3-4 days on a multi-GPU platform. Here, we contribute a new entropy-based function to evaluate the interaction between SNPs which does not compromise the findings about the most significant SNP interactions, but is more than 4000 times lighter in terms of computational time when running on GPUs and yields code more than 100x faster than a CPU of similar cost. We deploy a number of optimization techniques to tune the implementation of this function in CUDA and show how to enhance scalability on larger data sets.

    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This work was also supported by an Australian Research Council Future Fellowship to Prof. Moscato, by a grant from the ARC Discovery Project scheme, and by the Ministry of Education of Spain under project TIN2006-01078 and mobility grant PR2011-0144. We also thank NVIDIA for hardware donations under the CUDA Teaching and Research Center awards.
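An entropy-based evaluation of a SNP pair can be illustrated with mutual information, a standard entropy-derived score (the paper's exact function is not reproduced here, so this is an assumed stand-in). The XOR phenotype below shows why pairwise Boolean interactions matter: neither SNP is informative alone, only their combination is.

```python
from collections import Counter
from math import log2

def entropy(xs):
    # Shannon entropy (in bits) of the empirical distribution of xs.
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Toy data: the phenotype is the XOR of two binarised SNPs, a purely
# epistatic pattern in which neither SNP is informative on its own.
snp_a = [0, 0, 1, 1] * 25
snp_b = [0, 1, 0, 1] * 25
pheno = [a ^ b for a, b in zip(snp_a, snp_b)]
xor_ab = [a ^ b for a, b in zip(snp_a, snp_b)]

print(round(mutual_information(snp_a, pheno), 6))   # 0.0: a single SNP tells nothing
print(round(mutual_information(xor_ab, pheno), 6))  # 1.0: the interaction is fully informative
```

Computing such a score for every SNP pair across a whole genome is what makes the exhaustive search expensive and a natural fit for GPUs.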

    Graph algorithms for machine learning: a case-control study based on prostate cancer populations and high throughput transcriptomic data

    Background: The continuing proliferation of high-throughput biological data promises to revolutionize personalized medicine. Confirming the presence or absence of disease is an important goal. In this study, we seek to identify genes, gene products and biological pathways that are crucial to human health, with prostate cancer chosen as the target disease. Materials and methods: Using case-control transcriptomic data, we devise a graph-theoretical toolkit for this task. It employs both innovative algorithms and novel two-way correlations to pinpoint putative biomarkers that classify unknown samples as cancerous or normal. Results and conclusion: Observed accuracy on real data suggests that we are able to achieve a sensitivity of 92% and a specificity of 91%.
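The reported 92% sensitivity and 91% specificity are the usual confusion-matrix ratios; a minimal sketch of how such figures are computed (with made-up labels, not the study's data):

```python
def sensitivity_specificity(y_true, y_pred):
    # sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# 1 = cancerous, 0 = normal (illustrative labels only)
y_true = [1] * 25 + [0] * 25
y_pred = [1] * 23 + [0] * 2 + [0] * 23 + [1] * 2
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # 0.92 0.92
```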

    Quantifying the regeneration of bone tissue in biomedical images via Legendre moments

    Article published in the conference proceedings.

    We investigate the use of Legendre moments as biomarkers for an efficient and accurate classification of bone tissue in images coming from stem cell regeneration studies. Regions of existing bone, cartilage or new bone-forming cells are characterized at the tile level to quantify the degree of bone regeneration depending on culture conditions. Legendre moments are analyzed from three different perspectives: (1) their discriminant properties in a wide set of preselected feature vectors based on our clinical and computational experience, providing solutions whose accuracy exceeds 90%; (2) the amount of information to be retained when using Principal Component Analysis (PCA) to reduce the dimensionality of the problem from 2 to 6 dimensions; (3) the use of the (alpha-beta)-k-feature set problem to identify, from a combinatorial optimization approach, the k=4 features most relevant to our analysis. These techniques are compared in terms of computational complexity and classification accuracy to assess the strengths and limitations of the use of Legendre moments for this biomedical image processing application.

    Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech.
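A sketch of how Legendre moments of an image tile can be computed (a generic discrete approximation, not the authors' implementation): map pixel coordinates onto [-1, 1] and integrate the tile against products of Legendre polynomials.

```python
import numpy as np
from numpy.polynomial.legendre import legval

def legendre_moments(tile, order):
    """Discrete Legendre moments lambda_pq of a 2-D tile for p, q <= order,
    with pixel coordinates mapped onto [-1, 1] along both axes."""
    n, m = tile.shape
    x = np.linspace(-1, 1, m)
    y = np.linspace(-1, 1, n)
    # Row p of Px/Py holds the Legendre polynomial P_p sampled on the grid.
    Px = np.stack([legval(x, [0] * p + [1]) for p in range(order + 1)])
    Py = np.stack([legval(y, [0] * p + [1]) for p in range(order + 1)])
    dx, dy = 2 / (m - 1), 2 / (n - 1)
    lam = np.empty((order + 1, order + 1))
    for p in range(order + 1):
        for q in range(order + 1):
            norm = (2 * p + 1) * (2 * q + 1) / 4
            lam[p, q] = norm * (Py[p] @ tile @ Px[q]) * dx * dy
    return lam

tile = np.ones((64, 64))    # uniform tile: only lambda_00 should be non-zero
lam = legendre_moments(tile, order=2)
print(round(lam[0, 0], 2))  # close to 1.0 up to discretisation error
```

The low-order moments of each tile then serve as the feature vector fed to the classifiers described above.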

    Hierarchical Clustering Using the Arithmetic-Harmonic Cut: Complexity and Experiments

    Clustering, particularly hierarchical clustering, is an important method for understanding and analysing data across a wide variety of knowledge domains, with notable utility in systems where the data can be classified in an evolutionary context. This paper introduces a new hierarchical clustering problem defined by a novel objective function we call the arithmetic-harmonic cut. We show that the problem of finding such a cut is NP-hard and APX-hard but fixed-parameter tractable, which indicates that although the problem is unlikely to have a polynomial time algorithm (even for approximation), exact parameterized and local search based techniques may produce workable algorithms. To this end, we implement a memetic algorithm for the problem and demonstrate the effectiveness of the arithmetic-harmonic cut on a number of datasets, including a cancer-type dataset and a coronavirus dataset. We show favorable performance compared to currently used hierarchical clustering techniques such as k-Means, Graclus and Normalized-Cut. The arithmetic-harmonic cut metric overcomes difficulties that other hierarchical methods have in representing both inter-cluster differences and intra-cluster similarities.
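A minimal sketch of the local-search core such a memetic algorithm relies on. The objective below is only an illustrative stand-in for the arithmetic-harmonic cut (the exact definition is in the paper): it rewards large inter-cluster distances and, through reciprocals, small intra-cluster distances; `ah_objective` and `local_search` are hypothetical names, and the data assumes distinct points (non-zero distances within a cluster).

```python
import random

def ah_objective(D, part):
    # Illustrative stand-in for the arithmetic-harmonic cut score: sum of
    # inter-cluster distances plus sum of reciprocal intra-cluster distances,
    # so wide separations and tight clusters both raise the score.
    inter = intra = 0.0
    n = len(D)
    for i in range(n):
        for j in range(i + 1, n):
            if part[i] != part[j]:
                inter += D[i][j]
            else:
                intra += 1.0 / D[i][j]
    return inter + intra

def local_search(D, iters=500, seed=0):
    # Flip-one-vertex hill climbing, the typical local-search component
    # inside a memetic algorithm.
    rng = random.Random(seed)
    n = len(D)
    part = [rng.randint(0, 1) for _ in range(n)]
    best = ah_objective(D, part)
    for _ in range(iters):
        i = rng.randrange(n)
        part[i] ^= 1
        score = ah_objective(D, part)
        if score > best:
            best = score
        else:
            part[i] ^= 1   # revert a non-improving move
    return part, best

# Two well-separated clusters on a line; D holds pairwise distances.
pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
D = [[abs(a - b) for b in pts] for a in pts]
part, score = local_search(D)
print(part, score)
```

In the full memetic algorithm this local search would refine each member of an evolving population of partitions rather than a single random start.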

    Uncovering Molecular Biomarkers That Correlate Cognitive Decline with the Changes of Hippocampus' Gene Expression Profiles in Alzheimer's Disease

    Background: Alzheimer's disease (AD) is characterized by a neurodegenerative progression that alters cognition. On a phenotypical level, cognition is evaluated by means of the Mini-Mental State Examination (MMSE), and the post-mortem examination of the neurofibrillary tangle count (NFT) helps to confirm an AD diagnosis. The MMSE evaluates different aspects of cognition including orientation, short-term memory (retention and recall), attention and language. As there is a normal cognitive decline with aging, and death is the final state at which NFTs can be counted, the identification of brain gene expression biomarkers from these phenotypical measures has been elusive. Methodology/Principal Findings: We have reanalysed a microarray dataset contributed in 2004 by Blalock et al. of 31 samples corresponding to hippocampus gene expression from 22 AD subjects of varying degrees of severity and 9 controls. Instead of relying only on correlations of gene expression with the associated MMSE and NFT measures, and by using modern bioinformatics methods based on information theory and combinatorial optimization, we uncovered a 1,372-probe gene expression signature that presents a high consensus with established markers of progression in AD. The signature reveals alterations in calcium, insulin, phosphatidylinositol and Wnt signalling. Among the gene probes most correlated with AD severity we found those linked to synaptic function, neurofilament bundle assembly and neuronal plasticity. Conclusions/Significance: A transcription factor analysis of the 1,372-probe signature reveals significant associations with the EGR/KROX family of proteins, MAZ, and E2F1. The gene homologues of EGR1 (zif268, Egr-1 or Zenk), together with other members of the EGR family, are consolidating a key role in neuronal plasticity in the brain. These results indicate a degree of commonality between putative genes involved in AD and prion-induced neurodegenerative processes that warrants further investigation.
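The correlation-only baseline the abstract improves upon can be sketched as ranking probes by absolute Pearson correlation against the MMSE scores (all probe names and values below are made up for illustration):

```python
import statistics

def pearson(xs, ys):
    # Pearson correlation coefficient r between two equal-length samples.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def rank_probes(expression, mmse):
    """expression: {probe_id: [value per sample]}. Returns probe ids sorted
    by |Pearson r| against the MMSE scores, strongest association first."""
    return sorted(expression, key=lambda p: -abs(pearson(expression[p], mmse)))

mmse = [30, 28, 25, 20, 14, 8]                     # made-up severity gradient
expression = {
    "probe_down": [5.0, 4.8, 4.1, 3.5, 2.9, 2.0],  # expression tracks decline
    "probe_flat": [1.0, 1.1, 0.9, 1.0, 1.1, 0.9],  # uninformative probe
}
print(rank_probes(expression, mmse))  # ['probe_down', 'probe_flat']
```

The study's point is precisely that such univariate rankings miss much of the signal, which is why information-theoretic and combinatorial methods were added on top.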

    A Kernelisation Approach for Multiple d-Hitting Set and Its Application in Optimal Multi-Drug Therapeutic Combinations

    Therapies consisting of a combination of agents are an attractive proposition, especially in the context of diseases such as cancer, which can manifest with a variety of tumor types in a single case. However, uncovering usable drug combinations is expensive, both financially and in time. By employing computational methods to identify candidate combinations with a greater likelihood of success, we can avoid these problems, even when the amount of data is prohibitively large. Hitting Set is a combinatorial problem that has useful applications across many fields; however, as it is NP-complete, it is traditionally considered hard to solve exactly. We introduce a more general version of the problem, (α,β,d)-Hitting Set, which allows more precise control over how and what the hitting set targets. Employing the framework of Parameterized Complexity, we show that despite being NP-complete, the (α,β,d)-Hitting Set problem is fixed-parameter tractable with a kernel of size O(α^d k^d) when we parameterize by the size k of the hitting set and the maximum number α of the minimum number of hits, treating the maximum degree d of the target sets as a constant. We demonstrate the application of this problem to multiple drug selection for cancer therapy, showing the flexibility of the problem in tailoring such drug sets. The fixed-parameter tractability result indicates that for low values of the parameters the problem can be solved quickly using exact methods. We also demonstrate that the problem is indeed practical, with computation times on the order of 5 seconds, compared to previous Hitting Set applications using the same dataset, which exhibited times on the order of 1 day, even with relatively relaxed notions of what constitutes a low value for the parameters. Furthermore, the existence of a kernelization for (α,β,d)-Hitting Set indicates that the problem scales readily to large datasets.
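The classic bounded-search-tree argument behind the fixed-parameter tractability claim can be sketched for plain d-Hitting Set (without the α and β refinements of the paper's variant): branch on the at-most-d elements of any un-hit set, so the search tree has at most d^k leaves.

```python
def hitting_set(sets, k, chosen=frozenset()):
    """Find a hitting set of size <= k for a family of sets, or return None.
    Classic FPT branching: pick any set not yet hit and try each of its
    (at most d) elements, recursing with budget k - 1, so the search tree
    has at most d**k leaves."""
    unhit = next((s for s in sets if not (s & chosen)), None)
    if unhit is None:
        return set(chosen)           # every set is hit
    if k == 0:
        return None                  # budget exhausted
    for v in unhit:
        result = hitting_set(sets, k - 1, chosen | {v})
        if result is not None:
            return result
    return None

# Toy instance: each set lists hypothetical drug targets for one tumour type.
sets = [frozenset(s) for s in [{1, 2}, {2, 3}, {4, 5}, {1, 4}]]
print(hitting_set(sets, k=2))   # a valid answer, e.g. {2, 4}
print(hitting_set(sets, k=1))   # None: no single target hits all four sets
```

Kernelization goes further than this search: it first shrinks the instance itself to a size bounded only by the parameters, which is what makes the approach scale to large datasets.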

    Iteratively refining breast cancer intrinsic subtypes in the METABRIC dataset

    BACKGROUND: Multi-gene lists and single-sample predictor models are currently used to reduce the multidimensional complexity of breast cancers and to identify intrinsic subtypes. The perceived inability of some models to deal with the challenges of processing high-dimensional data, however, limits the accurate characterisation of these subtypes. Towards the development of robust strategies, we designed an iterative approach to consistently discriminate intrinsic subtypes and improve class prediction in the METABRIC dataset. FINDINGS: In this study, we employed the CM1 score to identify the most discriminative probes for each group, and an ensemble learning technique to assess the ability of these probes to assign subtype labels using 24 different classifiers. Our analysis comprises an iterative computation of these methods and statistical measures performed on a set of over 2000 samples. The refined labels assigned using this iterative approach proved to be more consistent and in better agreement with clinicopathological markers and patients' overall survival than those originally provided by the PAM50 method. CONCLUSIONS: The assignment of intrinsic subtypes has a significant impact on translational research for both understanding and managing breast cancer. The refined labelling therefore provides more accurate and reliable information by improving the source of fundamental science prior to clinical applications in medicine.
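The ensemble step can be illustrated with a plain majority vote over classifier outputs (a simplified stand-in for the 24-classifier ensemble; the classifiers, samples and agreement measure below are made up, with PAM50-style subtype names):

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of labels per classifier, samples in the same
    order. Returns the consensus label per sample and the agreement fraction."""
    consensus, agreement = [], []
    for votes in zip(*predictions):
        label, count = Counter(votes).most_common(1)[0]
        consensus.append(label)
        agreement.append(count / len(votes))
    return consensus, agreement

# Three hypothetical classifiers labelling four samples.
preds = [
    ["LumA", "LumB", "Basal", "Her2"],
    ["LumA", "LumA", "Basal", "Her2"],
    ["LumA", "LumB", "Basal", "LumB"],
]
labels, agree = majority_vote(preds)
print(labels)                         # ['LumA', 'LumB', 'Basal', 'Her2']
print([round(a, 2) for a in agree])   # [1.0, 0.67, 1.0, 0.67]
```

Samples with low agreement are natural candidates for the relabelling that each iteration of the refinement revisits.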

    Learning to extrapolate using continued fractions: Predicting the critical temperature of superconductor materials

    In Artificial Intelligence we often seek to identify an unknown target function of many variables $y=f(\mathbf{x})$ given a limited set of instances $S=\{(\mathbf{x}^{(i)},y^{(i)})\}$ with $\mathbf{x}^{(i)} \in D$, where $D$ is a domain of interest. We refer to $S$ as the training set, and the final quest is to identify the mathematical model that approximates this target function for new $\mathbf{x}$ from a set $T=\{\mathbf{x}^{(j)}\} \subset D$ with $T \neq S$ (i.e., testing the model's generalisation). However, for some applications, the main interest is approximating well the unknown function on a larger domain $D'$ that contains $D$. In cases involving the design of new structures, for instance, we may be interested in maximizing $f$; thus, the model derived from $S$ alone should also generalize well in $D'$ for samples with values of $y$ larger than the largest observed in $S$. In that sense, the AI system would provide important information that could guide the design process, e.g., using the learned model as a surrogate function to design new lab experiments. We introduce a method for multivariate regression based on iterative fitting of a continued fraction that incorporates additive spline models. We compared it with established methods such as AdaBoost, Kernel Ridge, Linear Regression, Lasso Lars, Linear Support Vector Regression, Multi-Layer Perceptrons, Random Forests, Stochastic Gradient Descent and XGBoost. We tested the performance on the important problem of predicting the critical temperature of superconductors based on physical-chemical characteristics.

    Comment: Submitted to IEEE Transactions on Artificial Intelligence (TAI)
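Once its per-level terms are fitted, a continued-fraction regression model is evaluated bottom-up. A minimal sketch (plain callables stand in for the paper's additive spline models; the function and term names are hypothetical):

```python
def eval_continued_fraction(x, terms):
    """Evaluate a0(x) + b1(x) / (a1(x) + b2(x) / (a2(x) + ...)) bottom-up.
    terms = [a0, (b1, a1), (b2, a2), ...], every entry a callable of x."""
    a0, rest = terms[0], terms[1:]
    value = 0.0
    for b, a in reversed(rest):      # innermost level first
        value = b(x) / (a(x) + value)
    return a0(x) + value

# Hypothetical one-level fraction representing f(x) = x + 1 / (x + 2).
terms = [lambda x: x, (lambda x: 1.0, lambda x: x + 2)]
print(eval_continued_fraction(3.0, terms))
```

Because each added level is a ratio of simple models, the overall function can grow or diverge outside the training domain, which is the behaviour the paper exploits for extrapolation.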